A New Method for Species Identification via Protein-Coding and Non-Coding DNA Barcodes by Combining Machine Learning with Bioinformatic Methods

Abstract

Species identification via DNA barcodes is contributing greatly to current bioinventory efforts. The initial, and widely accepted, proposal was to use the protein-coding cytochrome c oxidase subunit I (COI) region as the standard barcode for animals, but recently non-coding internal transcribed spacer (ITS) genes have been proposed as candidate barcodes for both animals and plants. However, achieving a robust alignment for non-coding regions can be problematic. Here we propose two new methods (DV-RBF and FJ-RBF) that combine machine learning with bioinformatics to address this issue, enabling species assignment from both coding and non-coding sequences. We demonstrate the value of the new methods with four empirical datasets: two representing typical protein-coding COI barcode datasets (neotropical bats and marine fish) and two representing non-coding ITS barcodes (rust fungi and brown algae). Using two random sub-sampling approaches, we demonstrate that the new methods significantly outperformed the existing neighbor-joining (NJ) and maximum likelihood (ML) methods for both coding and non-coding barcodes when there was complete species coverage in the reference dataset. The new methods also outperformed the NJ and ML methods for non-coding sequences under potentially incomplete species coverage, although in that case the NJ and ML methods performed slightly better than the new methods for protein-coding barcodes. A 100% success rate of species identification was achieved with the two new methods for 4,122 bat queries and 5,134 fish queries using COI barcodes, with 95% confidence intervals (CI) of 99.75–100%. The new methods also obtained a 96.29% success rate (95% CI: 91.62–98.40%) for 484 rust fungi queries and a 98.50% success rate (95% CI: 96.60–99.37%) for 1,094 brown algae queries, both using ITS barcodes.
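
The success rates above are reported with 95% confidence intervals, but the abstract does not say which interval method was used. As an illustration only, the sketch below computes a Wilson score interval, one common choice for a binomial proportion. The function name `wilson_interval` and its parameters are hypothetical and not taken from the paper, and its output need not match the reported bounds.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, total: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion.

    Assumption: this is NOT necessarily the interval method used in the
    paper; z = 1.96 corresponds to an approximate 95% confidence level.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= successes <= total:
        raise ValueError("successes must lie between 0 and total")

    p = successes / total                      # observed success rate
    denom = 1 + z * z / total                  # shared normalising term
    centre = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom

    # Clamp to [0, 1] to absorb floating-point overshoot at the extremes.
    return max(0.0, centre - half), min(1.0, centre + half)


if __name__ == "__main__":
    # Query count taken from the abstract: all 4,122 bat queries identified.
    low, high = wilson_interval(4122, 4122)
    print(f"{low:.4%} to {high:.4%}")
```

The Wilson interval is used here because it stays well defined when the observed rate is exactly 100%, where the simpler normal-approximation interval collapses to zero width.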